SarsaLandmark: an algorithm for learning in POMDPs with landmarks

نویسندگان

Michael R. James

Satinder P. Singh

چکیده

Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples, problems in which Sarsa(λ < 1 ) fails to find a good policy even though one exists. Despite this, these algorithms remain of great interest because alternative approaches to learning in POMDPs based on approximating belief-states do not scale. In this paper we present SarsaLandmark, an algorithm for learning in POMDPs with ”landmark” states (most man-made and many natural environments have landmarks). SarsaLandmark simultaneously preserves the advantages offered by eligibility traces and fixes the cause of the failure of Sarsa(λ) on the motivating counterexamples. We present a theoretical analysis of SarsaLandmark for the policy evaluation problem and present empirical results on a few learning control problems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stochastic Shortest Path with Energy Constraints in POMDPs

We extend the traditional framework of POMDPs to model resource consumption inducing a hard constraint on the behaviour of the model. Resource levels increase and decrease with transitions, and the hard constraint requires that the level remains positive in all steps. We present an algorithm for solving POMDPs with resource levels, developing on existing POMDP solvers. Our second contribution i...

متن کامل

Learning Sorting and Decision Trees with POMDPs

pomdps are general models of sequential decisions in which both actions and observations can be probabilistic. Many problems of interest can be formulated as pomdps, yet the use of pomdps has been limited by the lack of eeective algorithms. Recently this has started to change and a number of problems such as robot navigation and planning are beginning to be formulated and solved as pomdps. The ...

متن کامل

A Cultural Algorithm for POMDPs from Stochastic Inventory Control

Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs) which occur in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algor...

متن کامل

Learning the Reward Model of Dialogue POMDPs from Data

Spoken language communication between human and machines has become a challenge in research and technology. In particular, enabling the health care robots with spoken language interface is of great attention. Recently due to uncertainty characterizing dialogues, there has been interest for modelling the dialogue manager of spoken dialogue systems using Partially Observable Markov Decision Proce...

متن کامل

Representing hierarchical POMDPs as DBNs for multi-scale map learning

We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). We use this model for representing and learning multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or t...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2009

SarsaLandmark: an algorithm for learning in POMDPs with landmarks

نویسندگان

چکیده

منابع مشابه

Stochastic Shortest Path with Energy Constraints in POMDPs

Learning Sorting and Decision Trees with POMDPs

A Cultural Algorithm for POMDPs from Stochastic Inventory Control

Learning the Reward Model of Dialogue POMDPs from Data

Representing hierarchical POMDPs as DBNs for multi-scale map learning

عنوان ژورنال:

اشتراک گذاری